function name
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)
NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation
Saadi, Hossain Shaikh, Alam, Faria, Sanz-Guerrero, Mario, Bui, Minh Duc, Mager, Manuel, von der Wense, Katharina
This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent-based pipeline. First, a code-generation agent produces an initial solution from the input instruction. The candidate program is then executed against the provided unit tests (pytest-style, assert-based). Only the failing cases are forwarded to a debugger agent, which reruns the tests, extracts error traces, and, conditioning on the error messages, the current program, and the relevant test cases, generates a revised solution. Using this approach, our submission achieved first place in the shared task with a $Pass@1$ score of 95.4. We also make our code public.
- Europe > Germany > Rheinland-Pfalz > Mainz (0.61)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Europe > Germany > Saarland (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.67)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Texas > Travis County > Austin (0.04)
- (12 more...)
- Information Technology > Software (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- (3 more...)
Anchor Function
Figure 7: Actual example of how an anchor function impacts the generated solution. In this section, we provide additional experimental details and results for the experiments in Section 3. We include additional details for anchoring (Appendix A.1), the availability heuristic (Appendix A.3), Filtering prompts for longer canonical solutions. However, all components of the prompts from Section 3.3.2 We plot the analogous add-var results in Figure 10 and include full numerical results in Table 7. In this section, we augment Section 3.3.3
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.04)
Tool Calling for Arabic LLMs: Data Strategies and Instruction Tuning
Ersoy, Asim, Altinisik, Enes, Sencar, Husrev Taha, Darwish, Kareem
Tool calling is a critical capability that allows Large Language Models (LLMs) to interact with external systems, significantly expanding their utility. However, research and resources for tool calling are predominantly English-centric, leaving a gap in our understanding of how to enable this functionality for other languages, such as Arabic. This paper investigates three key research questions: (1) the necessity of in-language (Arabic) tool-calling data versus relying on cross-lingual transfer, (2) the effect of general-purpose instruction tuning on tool-calling performance, and (3) the value of fine-tuning on specific, high-priority tools. To address these questions, we conduct extensive experiments using base and post-trained variants of an open-weight Arabic LLM. To enable this study, we bridge the resource gap by translating and adapting two open-source tool-calling datasets into Arabic. Our findings provide crucial insights into the optimal strategies for developing robust tool-augmented agents for Arabic.
Cyber-Zero: Training Cybersecurity Agents without Runtime
Zhuo, Terry Yue, Wang, Dingmin, Ding, Hantian, Kumar, Varun, Wang, Zijian
Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, and demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Africa > Mozambique > Gaza Province > Xai-Xai (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.89)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Africa (0.04)